Functional gene clustering via gene annotation sentences, MeSH and GO keywords from biomedical literature

Abstract

Gene function annotation remains a key challenge in modern biology. This is especially true for high-throughput techniques such as gene expression experiments. Vital information about genes is available electronically from biomedical literature in the form of full texts and abstracts. In addition, various publicly available databases (such as GenBank, Gene Ontology and Entrez) provide access to gene-related information at different levels of biological organization, granularity and data format. This information is used to assess and interpret the results of high-throughput experiments. To improve keyword extraction for annotational clustering and other types of analyses, we have developed a novel text mining approach based on keywords identified at the level of gene annotation sentences (in particular, sentences characterizing biological function) instead of entire abstracts. Further, to improve the expressiveness and usefulness of gene annotation terms, we investigated the combination of sentence-level keywords with terms from the Medical Subject Headings (MeSH) and Gene Ontology (GO) resources. We find that sentence-level keywords combined with MeSH terms outperform the typical ‘baseline’ set-up (term frequencies at the level of abstracts) by a significant margin, whereas the addition of GO terms improves results only marginally. We validated our approach on a manually annotated corpus of 200 abstracts generated from 2 cancer categories and 10 genes per category. We then applied the method to three sets of differentially expressed genes obtained from pediatric brain tumor samples. This analysis suggests novel interpretations of the discovered gene expression patterns.
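
The contrast between the abstract-level baseline and sentence-level keywords can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue-word list, the stopword list, the `mesh:` feature prefix, the function names, and the toy abstract are all assumptions made for illustration, and real gene symbols containing hyphens or aliases would need a proper gene-name recognizer rather than this simple token match.

```python
import math
import re
from collections import Counter

# Illustrative assumptions, not taken from the paper: a tiny list of cue words
# that mark a sentence as describing biological function, and a tiny stopword list.
FUNCTION_CUES = {"regulates", "encodes", "activates", "inhibits", "binds"}
STOPWORDS = {"a", "an", "and", "by", "from", "in", "of", "the", "was", "were"}


def tokenize(text):
    """Return lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def abstract_level_terms(abstract):
    """Baseline set-up: term frequencies over the entire abstract."""
    return Counter(tokenize(abstract))


def sentence_level_terms(abstract, gene):
    """Count terms only in sentences that mention the gene and a function cue."""
    counts = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        tokens = tokenize(sentence)
        if gene.lower() in tokens and FUNCTION_CUES.intersection(tokens):
            counts.update(tokens)
    return counts


def combine_with_mesh(keywords, mesh_terms):
    """Add MeSH headings as prefixed features, kept distinct from free-text terms."""
    combined = Counter(keywords)
    for term in mesh_terms:
        combined["mesh:" + term.lower()] += 1
    return combined


def cosine(a, b):
    """Cosine similarity between two term-count profiles (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy abstract written for this example only.
abstract = ("TP53 regulates apoptosis in tumor cells. "
            "Samples were collected from 40 patients.")
baseline = abstract_level_terms(abstract)
profile = combine_with_mesh(sentence_level_terms(abstract, "TP53"), ["Apoptosis"])
```

In this sketch the baseline profile picks up incidental words such as "patients", while the sentence-level profile keeps only terms from the function-describing sentence plus the MeSH heading. Pairwise `cosine` scores between such gene profiles could then feed any standard clustering routine; the choice of similarity measure here is also an assumption.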
